Abstract

We propose a model selection approach for covariance estimation of a multi-dimensional
stochastic process. Under very general assumptions, observing i.i.d. replications of the
process at fixed observation points, we construct an estimator of the covariance function
by expanding the process onto a collection of basis functions. We study the non-asymptotic
properties of this estimator and give a tractable way of selecting the best estimator among
a set of candidates. The optimality of the procedure is proved via an oracle inequality,
which warrants that the best model is selected.

1 Introduction

Covariance estimation is a fundamental issue in inference for stochastic processes, with
many applications ranging from hydroscience and geostatistics to financial series and
epidemiology (we refer to [Ste99], [Jou77] or [Cre93] for general references for applications).
Parametric methods have been extensively studied in the statistical literature (see [Cre93]
for a review), while nonparametric procedures have received growing attention over the last
decades. One of the main issues in this framework is to impose that the estimator is also a
covariance function, which prevents the direct use of usual nonparametric statistical methods.
In this paper, we propose to use a model selection procedure to construct a nonparametric
estimator of the covariance function of a stochastic process under general assumptions for
the process. In particular, we will assume neither Gaussianity nor stationarity.

Consider a stochastic process $X(t)$ with values in $\mathbb{R}$, indexed by $t \in T$, a
subset of $\mathbb{R}^d$, $d \in \mathbb{N}$. Throughout the paper, we assume that $X$ has
finite covariance $\sigma(s,t) = \mathrm{cov}(X(s), X(t)) < +\infty$ for all $s, t \in T$ and,
for the sake of simplicity, zero mean $\mathbb{E}(X(t)) = 0$ for all $t \in T$. The
observations are $X_i(t_j)$ for $i = 1, \dots, N$, $j = 1, \dots, n$, where the observation
points $t_1, \dots, t_n \in T$ are fixed, and $X_1, \dots, X_N$ are independent copies of the
process $X$. Our aim is to build a nonparametric estimator of its covariance.

Functional approximations of the processes $X_1, \dots, X_N$ from data $(X_i(t_j))$ are
involved in covariance function estimation. When dealing with functional data analysis (see,
e.g., [RS05]), smoothing the processes $X_1, \dots, X_N$ is sometimes carried out as a first
step before computing the empirical covariance, for example by spline interpolation (see for
instance [ETA03]) or by projection onto a general finite basis. Let
$x_i = (X_i(t_1), \dots, X_i(t_n))^\top$ be the vector of observations at the points
$t_1, \dots, t_n$ with $i \in \{1, \dots, N\}$. Let $\{g_\lambda\}_{\lambda \in \mathcal{M}}$
be a collection of (usually, but not always, linearly independent) functions
$g_\lambda : T \to \mathbb{R}$, where $\mathcal{M}$ denotes a generic countable set of
indices. Then, let $(m) \subset \mathcal{M}$ be a subset of indices of size $m \in \mathbb{N}$
and define the $n \times m$ matrix $G$ with entries $g_{j\lambda} = g_\lambda(t_j)$,
$j = 1, \dots, n$, $\lambda \in (m)$. $G$ will be called the design matrix corresponding to
the set of basis functions indexed by $(m)$.

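In such a setting, usual covariance estimation is a two-step procedure: first, for each
$i = 1, \dots, N$, fit the regression model

$$x_i = G a_i + \varepsilon_i \tag{1.1}$$

(by least squares or regularized least squares), where $\varepsilon_i$ are random vectors in
$\mathbb{R}^n$, to obtain estimates
$\hat a_i = (\hat a_{i,\lambda})_{\lambda \in (m)} \in \mathbb{R}^m$ of $a_i$, where in the case
of standard least squares estimation (assuming for simplicity that $G^\top G$ is invertible)

$$\hat a_i = (G^\top G)^{-1} G^\top x_i, \quad i = 1, \dots, N.$$

Then, the estimation of the covariance is given by computing the following estimate

$$\hat\Sigma = G \hat\Psi G^\top, \tag{1.2}$$

where

$$\hat\Psi = \frac{1}{N} \sum_{i=1}^{N} \hat a_i \hat a_i^\top
= (G^\top G)^{-1} G^\top \Big( \frac{1}{N} \sum_{i=1}^{N} x_i x_i^\top \Big) G (G^\top G)^{-1}. \tag{1.3}$$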

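This corresponds to approximating the process $X_i$ by a truncated process $\tilde X_i$
defined as

$$\tilde X_i(t) = \sum_{\lambda \in (m)} \hat a_{i,\lambda}\, g_\lambda(t), \quad i = 1, \dots, N,$$

and to choosing the empirical covariance of $\tilde X$ as an estimator of the covariance of
$X$, defined by

$$\hat\sigma(s,t) = \frac{1}{N} \sum_{i=1}^{N} \tilde X_i(s)\, \tilde X_i(t).$$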
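The two-step procedure above can be sketched numerically as follows. This is only an illustrative sketch: the polynomial basis, the simulated data and the function name `two_step_covariance` are assumptions made for the example and are not part of the paper.

```python
import numpy as np

def two_step_covariance(X, G):
    """X: (N, n) array whose rows are the observation vectors x_i.
    G: (n, m) design matrix. Returns (Psi_hat, Sigma_hat)."""
    N = X.shape[0]
    # Step 1: least squares coefficients a_i for every replication, model (1.1).
    A, *_ = np.linalg.lstsq(G, X.T, rcond=None)   # shape (m, N), column i is a_i
    # Step 2: empirical second-moment matrix of the coefficients, eq. (1.3).
    Psi_hat = A @ A.T / N
    Sigma_hat = G @ Psi_hat @ G.T                 # eq. (1.2)
    return Psi_hat, Sigma_hat

# Hypothetical example: cubic polynomial basis on a regular grid of [0, 1].
rng = np.random.default_rng(0)
N, n, m = 200, 12, 4
t = np.linspace(0.0, 1.0, n)
G = np.vander(t, m, increasing=True)
X = rng.standard_normal((N, m)) @ G.T + 0.1 * rng.standard_normal((N, n))
Psi_hat, Sigma_hat = two_step_covariance(X, G)
```

By construction `Sigma_hat` is symmetric and non-negative definite, since it is a Gram-type matrix built from the fitted coefficients.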

In this paper we propose to view the estimator (1.2) as the covariance obtained by
considering a least squares estimator in the following matrix regression model

$$x_i x_i^\top = G \Psi G^\top + U_i, \quad i = 1, \dots, N, \tag{1.4}$$

where $\Psi$ is a symmetric matrix and $U_i$ are i.i.d. matrix errors. Fitting the models
(1.1) and (1.4) by least squares naturally leads to the definition of different contrast and
risk functions, as the estimation is not performed in the same space ($\mathbb{R}^m$ for model
(1.1) and $\mathbb{R}^{m \times m}$ for model (1.4)). By choosing an appropriate loss function,
least squares estimation in model (1.4) also leads to the natural estimate (1.2) derived from
least squares estimation in model (1.1). However, the problem of model selection, i.e.
choosing an appropriate data-based subset of indices $(m) \in \mathcal{M}$, is very different
in model (1.1) and in model (1.4). Indeed, model selection for (1.1) depends on the
variability of the vectors $x_i$, while for (1.4) it depends on the variability of the
matrices $x_i x_i^\top$. One of the main contributions of this paper is to show that
considering model (1.4) makes it possible to handle a large variety of cases and to build an
optimal model selection estimator of the covariance without too strong assumptions on the
model. Moreover, it will be shown that considering model (1.4) leads to the estimator
$\hat\Psi$ in (1.3), which is guaranteed to be in the class of non-negative definite matrices
and thus to a proper covariance matrix $\hat\Sigma = G \hat\Psi G^\top$.

A similar method has been developed for smooth interpolation of covariance functions in
[BJG95]. However, that paper is restricted to basis functions that are determined by
reproducing kernels in suitable Hilbert spaces. Furthermore, a matrix metric different from
(though related to) the Frobenius matrix norm is adopted as a fitting criterion. Similar ideas
are tackled in [MP08]. These authors deal with the estimation of $\Sigma$ within the
covariance class $\Gamma = G \Psi G^\top$ induced by an orthogonal wavelet expansion. However,
their fitting criterion is not general since they choose the Gaussian likelihood as a contrast
function, and thus their method requires specific distributional assumptions. We also point
out that computation of the Gaussian likelihood requires inversion of $G \Psi G^\top$, which is
not directly feasible if $\mathrm{rank}(G) < n$ or if some diagonal entries of the definite
non-negative (d.n.n.) matrix $\Psi$ are zero.

Hence, to our knowledge, no previous work has proposed to use the matrix regression model
(1.4) under general moment assumptions on the process $X$, using a general basis expansion
for nonparametric covariance function estimation.

The paper is organized as follows. The description of the statistical framework of the
matrix regression is given in Section 2. Section 3 is devoted to the main statistical results.
Namely, we study the behavior of the estimator for a fixed model in Section 3.1, while
Section 3.2 deals with the model selection procedure and provides the oracle inequality.
Section 4 states a concentration inequality that is used throughout the paper, while the
proofs are postponed to a technical Appendix.

2 Nonparametric Model Selection for Covariance Estimation

Recall that $X = (X(t))_{t \in T}$ is an $\mathbb{R}$-valued stochastic process, where $T$
denotes some subset of $\mathbb{R}^d$, $d \in \mathbb{N}$. Assume that $X$ has finite moments
up to order 4 and zero mean, i.e. $\mathbb{E}(X(t)) = 0$ for all $t \in T$. The covariance
function of $X$ is denoted by $\sigma(s,t) = \mathrm{cov}(X(s), X(t))$ for $s, t \in T$, and
recall that $X_1, \dots, X_N$ are independent copies of the process $X$.

In this work, we observe these independent copies of the process at observation points
$t_1, \dots, t_n \in T$, denoted by $X_i(t_j)$, with $i = 1, \dots, N$, $j = 1, \dots, n$.
Recall that $x_i = (X_i(t_1), \dots, X_i(t_n))^\top$ is the vector of observations at the
points $t_1, \dots, t_n$ for each $i = 1, \dots, N$. The matrix
$\Sigma = \mathbb{E}(x_i x_i^\top) = (\sigma(t_j, t_k))_{1 \le j \le n,\, 1 \le k \le n}$ is the
covariance matrix of $X$ at the observation points. Let $\bar x$ and $S$ denote the sample
mean and the sample covariance (not corrected by the mean) of the data $x_1, \dots, x_N$, i.e.

$$\bar x = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad S = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^\top.$$

Our aim is to build a model selection estimator of the covariance of the process observed
with $N$ replications, but without additional assumptions such as stationarity or
Gaussianity. The asymptotics will be taken with respect to $N$, the number of copies of the
process.

2.1 Notations and preliminary definitions

First, we define specific matrix notations. We refer to [Lut96] or [KvR05] for definitions and
properties of matrix operations and special matrices. As usual, vectors in $\mathbb{R}^k$ are
regarded as column vectors for all $k \in \mathbb{N}$. To be able to write general methods
for all our models, we will treat matrix data as a natural extension of vector data, with, of
course, a different correlation structure. For this, we introduce a natural linear
transformation which converts any matrix into a column vector. The vectorization of a
$k \times n$ matrix $A = (a_{ij})_{1 \le i \le k,\, 1 \le j \le n}$ is the $kn \times 1$ column
vector denoted by $\mathrm{vec}(A)$, obtained by stacking the columns of the matrix $A$ on top
of one another. That is,

$$\mathrm{vec}(A) = [a_{11}, \dots, a_{k1}, a_{12}, \dots, a_{k2}, \dots, a_{1n}, \dots, a_{kn}]^\top.$$

For a symmetric $k \times k$ matrix $A$, the vector $\mathrm{vec}(A)$ contains more
information than necessary, since the matrix is completely determined by its lower triangular
portion, that is, the $k(k+1)/2$ entries on and below the main diagonal. Hence, we introduce
the symmetrized vectorization, which corresponds to a half-vectorization, denoted by
$\mathrm{vech}(A)$. More precisely, for any matrix
$A = (a_{ij})_{1 \le i \le k,\, 1 \le j \le k}$, define $\mathrm{vech}(A)$ as the
$k(k+1)/2 \times 1$ column vector obtained by vectorizing only the lower triangular part of
$A$. That is,

$$\mathrm{vech}(A) = [a_{11}, \dots, a_{k1}, a_{22}, \dots, a_{k2}, \dots, a_{(k-1)(k-1)}, a_{k(k-1)}, a_{kk}]^\top.$$

There exist unique linear transformations which transform the half-vectorization of a matrix
to its vectorization and vice versa, called, respectively, the duplication matrix and the
elimination matrix. For any $k \in \mathbb{N}$, the $k^2 \times k(k+1)/2$ duplication matrix
is denoted by $D_k$, $\mathbf{1}_k = (1, \dots, 1)^\top \in \mathbb{R}^k$, and $I_k$ is the
identity matrix in $\mathbb{R}^{k \times k}$.
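As a concrete illustration of these operators, the following minimal sketch builds vec, vech and the duplication matrix with NumPy. The function names are chosen for this example only.

```python
import numpy as np

def vec(A):
    # Stack the columns of A on top of one another (column-major order).
    return A.reshape(-1, order="F")

def vech(A):
    # Stack, column by column, the entries on and below the main diagonal.
    k = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(k)])

def duplication_matrix(k):
    # D_k has size k^2 x k(k+1)/2 and satisfies D_k vech(A) = vec(A)
    # for every symmetric k x k matrix A.
    D = np.zeros((k * k, k * (k + 1) // 2))
    col = 0
    for j in range(k):
        for i in range(j, k):
            D[j * k + i, col] = 1.0   # position of a_ij in vec(A)
            D[i * k + j, col] = 1.0   # position of a_ji in vec(A)
            col += 1
    return D

B = np.arange(9.0).reshape(3, 3)
A = B + B.T                           # a symmetric test matrix
D3 = duplication_matrix(3)
```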

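For any matrix $A$, $A^\top$ is the transpose of $A$, $\mathrm{tr}(A)$ is the trace of $A$,
$\|A\|$ is the Frobenius matrix norm defined by $\|A\|^2 = \mathrm{tr}(A A^\top)$,
$\lambda_{\max}(A)$ is the maximum eigenvalue of $A$, and $\rho(A)$ is the spectral norm of
$A$, that is, $\rho(A) = \lambda_{\max}(A)$ for $A$ a d.n.n. matrix. If
$A = (a_{ij})_{1 \le i \le k,\, 1 \le j \le n}$ is a $k \times n$ matrix and
$B = (b_{ij})_{1 \le i \le p,\, 1 \le j \le q}$ is a $p \times q$ matrix, then the Kronecker
product of the two matrices, denoted by $A \otimes B$, is the $kp \times nq$ block matrix

$$A \otimes B = \begin{pmatrix} a_{11} B & \cdots & a_{1n} B \\ \vdots & & \vdots \\ a_{k1} B & \cdots & a_{kn} B \end{pmatrix}.$$

For any random matrix $Z = (Z_{ij})_{1 \le i \le k,\, 1 \le j \le n}$, its expectation is
denoted by $\mathbb{E}(Z) = (\mathbb{E}(Z_{ij}))_{1 \le i \le k,\, 1 \le j \le n}$. For any
random vector $z = (Z_i)_{1 \le i \le k}$, let
$V(z) = (\mathrm{cov}(Z_i, Z_j))_{1 \le i, j \le k}$ be its covariance matrix. With this
notation, $V(x_1) = V(x_i) = (\sigma(t_j, t_k))_{1 \le j \le n,\, 1 \le k \le n}$ is the
covariance matrix of $X$.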

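Let $(m) \in \mathcal{M}$ and recall that to the finite set
$\mathcal{G}_m = \{g_\lambda\}_{\lambda \in (m)}$ of functions $g_\lambda : T \to \mathbb{R}$
we associate the $n \times m$ matrix $G$ with entries $g_{j\lambda} = g_\lambda(t_j)$,
$j = 1, \dots, n$, $\lambda \in (m)$. Furthermore, for each $t \in T$, we write
$G_t = (g_\lambda(t), \lambda \in (m))^\top$. For $k \in \mathbb{N}$, $\mathcal{S}_k$ denotes
the linear subspace of $\mathbb{R}^{k \times k}$ composed of symmetric matrices. For
$G \in \mathbb{R}^{n \times m}$, $\mathcal{S}(G)$ is the linear subspace of
$\mathbb{R}^{n \times n}$ defined by

$$\mathcal{S}(G) = \{ G \Psi G^\top : \Psi \in \mathcal{S}_m \}.$$

Let $\mathcal{S}_N(G)$ be the linear subspace of $\mathbb{R}^{nN \times n}$ defined by

$$\mathcal{S}_N(G) = \{ \mathbf{1}_N \otimes G \Psi G^\top : \Psi \in \mathcal{S}_m \}
= \{ \mathbf{1}_N \otimes \Gamma : \Gamma \in \mathcal{S}(G) \}$$

and let $\mathcal{V}_N(G)$ be the linear subspace of $\mathbb{R}^{n^2 N}$ defined by

$$\mathcal{V}_N(G) = \{ \mathbf{1}_N \otimes \mathrm{vec}(G \Psi G^\top) : \Psi \in \mathcal{S}_m \}
= \{ \mathbf{1}_N \otimes \mathrm{vec}(\Gamma) : \Gamma \in \mathcal{S}(G) \}.$$

All these spaces are regarded as Euclidean spaces with the scalar product associated to the
Frobenius matrix norm.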

2.2 Model 

The approach that we will develop to estimate the covariance function $\sigma$ is based on
the following two main ingredients. First, we consider a functional expansion $\tilde X$ to
approximate the underlying process $X$ and take the covariance of $\tilde X$ as an
approximation of the true covariance $\Sigma$.

For this, let $(m) \in \mathcal{M}$ and consider an approximation to the process $X$ of the
following form:

$$\tilde X(t) = \sum_{\lambda \in (m)} a_\lambda\, g_\lambda(t), \tag{2.1}$$

where $a_\lambda$ are suitable random coefficients. For instance, if $X$ takes its values in
$L^2(T)$ (the space of square integrable real-valued functions on $T$) and if
$(g_\lambda)_{\lambda \in \mathcal{M}}$ are orthonormal functions in $L^2(T)$, then one can
take

$$a_\lambda = \int_T X(t)\, g_\lambda(t)\, dt.$$

Several bases can thus be considered, such as a polynomial basis on $\mathbb{R}^d$, or a
Fourier expansion on a rectangle $T \subset \mathbb{R}^d$ (i.e.
$g_\lambda(t) = e^{i \langle \omega_\lambda, t \rangle}$, using a regular grid of discrete
frequencies $\{\omega_\lambda \in \mathbb{R}^d, \lambda \in (m)\}$ that do not depend on
$t_1, \dots, t_n$). One can also use, as in [ETA03], tensor products of B-splines on a
rectangle $T \subset \mathbb{R}^d$, with a regular grid of nodes in $\mathbb{R}^d$ not
depending on $t_1, \dots, t_n$, or a standard wavelet basis on $\mathbb{R}^d$, depending on a
regular grid of locations in $\mathbb{R}^d$ and discrete scales in $\mathbb{R}_+$. Another
class of natural expansions is provided by the Karhunen-Loève expansion of the process $X$
(see [Adl90] for more references).

Therefore, it is natural to consider the covariance function $\rho$ of $\tilde X$ as an
approximation of $\sigma$. The covariance $\rho$ can be written as

$$\rho(s,t) = G_s^\top \Psi G_t, \tag{2.2}$$

where, after reindexing the functions if necessary,
$G_t = (g_\lambda(t), \lambda \in (m))^\top$ and

$$\Psi = (\mathbb{E}(a_\lambda a_\mu)), \quad \text{with } (\lambda, \mu) \in (m) \times (m).$$

Hence we are led to look for an estimate $\hat\sigma$ of $\sigma$ in the class of functions
of the form (2.2), with $\Psi \in \mathbb{R}^{m \times m}$ some symmetric matrix. Note that
the choice of the function expansion in (2.1), in particular the choice of the subset of
indices $(m)$, will be crucial for the approximation properties of the covariance function
$\rho$. This estimation procedure has several advantages: it will be shown that an
appropriate choice of loss function leads to the construction of a symmetric d.n.n. matrix
$\hat\Psi$ (see Proposition 3.1), and thus the resulting estimate

$$\hat\sigma(s,t) = G_s^\top \hat\Psi G_t$$

is a covariance function, so the resulting estimator can be plugged into other procedures
which require working with a covariance function. We also point out that the large number of
existing approaches for function approximation of the type (2.1) (such as those based on
Fourier, wavelets, kernels, splines or radial functions) provides great flexibility to the
model.

Secondly, we use the Frobenius matrix norm to quantify the risk of the covariance matrix
estimators. Recall that $\Sigma = (\sigma(t_j, t_k))_{1 \le j, k \le n}$ is the true
covariance, while $\Gamma = (\rho(t_j, t_k))_{1 \le j, k \le n}$ will denote the covariance
matrix of the approximated process $\tilde X$ at the observation points. Hence

$$\Gamma = G \Psi G^\top. \tag{2.3}$$

Comparing the covariance function $\rho$ with the true one $\sigma$ over the design points
$t_j$ implies quantifying the deviation of $\Gamma$ from $\Sigma$. For this, consider the
following loss function

$$L(\Psi) = \mathbb{E} \| x x^\top - G \Psi G^\top \|^2,$$

where $x = (X(t_1), \dots, X(t_n))^\top$ and $\|\cdot\|$ is the Frobenius matrix norm. Note
that

$$L(\Psi) = \| \Sigma - G \Psi G^\top \|^2 + C,$$

where the constant $C$ does not depend on $\Psi$. The Frobenius matrix norm provides a
meaningful metric for comparing covariance matrices, widely used in multivariate analysis, in
particular in the theory of principal components analysis. See also [BR97], [SS05] and
references therein for other applications of this loss function.
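The decomposition of the loss stated above follows from a standard bias-variance expansion. A short derivation is sketched here for convenience; the explicit value given for the constant $C$ is our own reading, obtained under the finite fourth moment assumption, and is not spelled out in the text.

```latex
\begin{align*}
L(\Psi) &= \mathbb{E}\,\| (x x^\top - \Sigma) + (\Sigma - G \Psi G^\top) \|^2 \\
        &= \mathbb{E}\,\| x x^\top - \Sigma \|^2
           + 2\,\mathrm{tr}\!\big( \mathbb{E}(x x^\top - \Sigma)\,(\Sigma - G \Psi G^\top)^\top \big)
           + \| \Sigma - G \Psi G^\top \|^2 \\
        &= \| \Sigma - G \Psi G^\top \|^2 + C,
  \qquad C = \mathbb{E}\,\| x x^\top - \Sigma \|^2 = \mathrm{tr}\big( V(\mathrm{vec}(x x^\top)) \big),
\end{align*}
```

where the cross term vanishes because $\mathbb{E}(x x^\top) = \Sigma$.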

To the loss $L$ corresponds the following empirical contrast function $L_N$, which will be
the fitting criterion we will try to minimize:

$$L_N(\Psi) = \frac{1}{N} \sum_{i=1}^{N} \| x_i x_i^\top - G \Psi G^\top \|^2.$$

We point out that this loss is exactly the sum of the squares of the residuals corresponding
to the matrix linear regression model

$$x_i x_i^\top = G \Psi G^\top + U_i, \quad i = 1, \dots, N, \tag{2.4}$$

with i.i.d. matrix errors $U_i$ such that $\mathbb{E}(U_i) = 0$. This remark provides a
natural framework to study the covariance estimation problem as a matrix regression model.
Note also that the set of matrices $G \Psi G^\top$ is a linear subspace of
$\mathbb{R}^{n \times n}$ when $\Psi$ ranges over the space of symmetric matrices
$\mathcal{S}_m$.

To summarize our approach, we finally propose the following two-step estimation procedure.
In a first step, for a given design matrix $G$, define

$$\hat\Psi = \arg\min_{\Psi \in \mathcal{S}_m} L_N(\Psi),$$

and take $\hat\Sigma = G \hat\Psi G^\top$ as an estimator of $\Sigma$. Note that $\hat\Psi$
will be shown to be a d.n.n. matrix (see Proposition 3.1), and thus $\hat\Sigma$ is also a
d.n.n. matrix. Since the minimization of $L_N$ with respect to $\Psi$ is done over the linear
space of symmetric matrices $\mathcal{S}_m$, it can be transformed into a classical least
squares linear problem, and the computation of $\hat\Psi$ is therefore quite simple. For a
given design matrix $G$, we will construct an estimator for $\Gamma = G \Psi G^\top$ which
will be close to $\Sigma = V(x_1)$ as soon as $\tilde X$ is a sharp approximation of $X$. So,
the role of $G$, and thus the choice of the subset of indices $(m)$, is crucial since it
determines the behavior of the estimator.

Hence, in a second step, we aim at selecting the best design matrix $G = G_{\hat m}$ among a
collection of candidates $\{G_m, (m) \in \mathcal{M}\}$. For this, methods and results from
the theory of model selection in linear regression can be applied to the present context. In
particular, the results in [Bar00], [Com01] or [LL08] will be useful in dealing with model
selection for the framework (2.4). Note that only assumptions about moments, not specific
distributions of the data, are involved in the estimation procedure.

Remark 2.1. We consider here a least squares estimate of the covariance. Note that suitable
regularization terms or constraints could also be incorporated into the minimization of
$L_N(\Psi)$ to impose desired properties on the resulting estimator, such as smoothness or
sparsity conditions as in [LRZ08].



3 Oracle inequality for Covariance Estimation 

The first part of this section describes the properties of the least squares estimator
$\hat\Sigma = G \hat\Psi G^\top$, while the second part builds a selection procedure to pick
automatically the best estimate among a collection of candidates.



3.1 Least Squares Covariance Estimation

Given some $n \times m$ fixed design matrix $G$ associated to a finite family of $m$ basis
functions, the least squares covariance estimator of $\Sigma$ is defined by

$$\hat\Sigma = G \hat\Psi G^\top = \arg\min \Big\{ \frac{1}{N} \sum_{i=1}^{N} \| x_i x_i^\top - \Gamma \|^2 : \Gamma = G \Psi G^\top,\ \Psi \in \mathcal{S}_m \Big\}. \tag{3.1}$$

The corresponding estimator of the covariance function $\sigma$ is

$$\hat\sigma(s,t) = G_s^\top \hat\Psi G_t. \tag{3.2}$$

Proposition 3.1. Let $Y_1, \dots, Y_N \in \mathbb{R}^{n \times n}$ and
$G \in \mathbb{R}^{n \times m}$ be arbitrary matrices. Then, the infimum

$$\inf \Big\{ \frac{1}{N} \sum_{i=1}^{N} \| Y_i - G \Psi G^\top \|^2 : \Psi \in \mathcal{S}_m \Big\}$$

is achieved at

$$\hat\Psi = (G^\top G)^{-} G^\top \Big( \frac{\bar Y + \bar Y^\top}{2} \Big) G (G^\top G)^{-}, \tag{3.3}$$

where $(G^\top G)^{-}$ is any generalized inverse of $G^\top G$ (see [EHN96] for a general
definition), and

$$\bar Y = \frac{1}{N} \sum_{i=1}^{N} Y_i.$$

Furthermore, $G \hat\Psi G^\top$ is the same for all the generalized inverses
$(G^\top G)^{-}$ of $G^\top G$. In particular, if $Y_1, \dots, Y_N \in \mathcal{S}_n$ (i.e.,
if they are symmetric matrices) then any minimizer has the form

$$\hat\Psi = (G^\top G)^{-} G^\top \bar Y G (G^\top G)^{-}.$$

If $Y_1, \dots, Y_N$ are d.n.n., then these matrices $\hat\Psi$ are d.n.n.

If we assume that $(G^\top G)^{-1}$ exists, then Proposition 3.1 shows that we retrieve the
expression (1.3) for $\hat\Psi$ that has been derived from least squares estimation in model
(1.1).
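The closed form (3.3) can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the Moore-Penrose inverse as one particular generalized inverse, a deliberately rank-deficient design, and random symmetric perturbations as a crude optimality check. The helper names `ls_psi` and `contrast` are hypothetical.

```python
import numpy as np

def ls_psi(Ys, G):
    """Closed form (3.3), with the Moore-Penrose inverse as generalized inverse."""
    Ybar = np.mean(Ys, axis=0)
    GtG_pinv = np.linalg.pinv(G.T @ G, rcond=1e-10)
    return GtG_pinv @ G.T @ ((Ybar + Ybar.T) / 2) @ G @ GtG_pinv

def contrast(Ys, G, Psi):
    # (1/N) * sum_i ||Y_i - G Psi G^T||^2 in Frobenius norm.
    return np.mean([np.linalg.norm(Y - G @ Psi @ G.T, "fro") ** 2 for Y in Ys])

rng = np.random.default_rng(1)
N, n, m = 50, 6, 3
G = rng.standard_normal((n, m))
G[:, 2] = G[:, 0] + G[:, 1]           # rank-deficient on purpose: G^T G is singular
Ys = rng.standard_normal((N, n, n))   # arbitrary matrices, not necessarily symmetric
Psi_hat = ls_psi(Ys, G)
best = contrast(Ys, G, Psi_hat)

# Random symmetric perturbations of Psi_hat should never decrease the contrast.
perturbed = []
for _ in range(100):
    E = rng.standard_normal((m, m))
    perturbed.append(contrast(Ys, G, Psi_hat + 0.1 * (E + E.T)))
```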

Theorem 3.2. Let $S = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^\top$. Then, the least squares
covariance estimate defined by (3.1) is given by the d.n.n. matrix

$$\hat\Sigma = G \hat\Psi G^\top = \Pi S \Pi,$$

where

$$\hat\Psi = (G^\top G)^{-} G^\top S G (G^\top G)^{-}, \tag{3.4}$$
$$\Pi = G (G^\top G)^{-} G^\top.$$

Moreover, $\hat\Sigma$ has the following interpretations in terms of orthogonal projections:

i) $\hat\Sigma$ is the projection of $S \in \mathbb{R}^{n \times n}$ on $\mathcal{S}(G)$.

ii) $\mathbf{1}_N \otimes \hat\Sigma$ is the projection of
$Y = (x_1 x_1^\top, \dots, x_N x_N^\top)^\top \in \mathbb{R}^{nN \times n}$ on
$\mathcal{S}_N(G)$.

iii) $\mathbf{1}_N \otimes \mathrm{vec}(\hat\Sigma)$ is the projection of
$y = (\mathrm{vec}^\top(x_1 x_1^\top), \dots, \mathrm{vec}^\top(x_N x_N^\top))^\top \in \mathbb{R}^{n^2 N}$
on $\mathcal{V}_N(G)$.

The proof of this theorem is a direct application of Proposition 3.1. Hence, for a given
design matrix $G$, the least squares estimator $\hat\Sigma = \hat\Sigma(G)$ is well defined
and has the structure of a covariance matrix. It remains to study how to pick automatically
the estimate when dealing with a collection of design matrices coming from several
approximation choices for the random process $X$.
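The projection interpretation i) can be illustrated numerically: the residual $S - \Pi S \Pi$ should be orthogonal, for the Frobenius scalar product, to every matrix of the form $G \Psi G^\top$ with $\Psi$ symmetric. The data and design below are simulated placeholders, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, m = 100, 8, 3
G = rng.standard_normal((n, m))      # illustrative design matrix
X = rng.standard_normal((N, n))      # rows play the role of the vectors x_i
S = X.T @ X / N                      # sample covariance, not mean-corrected
Pi = G @ np.linalg.pinv(G)           # orthogonal projector onto the column space of G
Sigma_hat = Pi @ S @ Pi

# Frobenius inner product of the residual with an arbitrary element of S(G).
B = rng.standard_normal((m, m))
Psi = B + B.T
inner = np.sum((S - Sigma_hat) * (G @ Psi @ G.T))
min_eig = np.linalg.eigvalsh(Sigma_hat).min()
```

Up to rounding error, `inner` vanishes and `min_eig` is non-negative, in line with the statement that the estimator is a d.n.n. matrix.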



3.2 Main Result 



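Consider a collection of indices $(m) \in \mathcal{M}$ with size $m$. Let also
$\{G_m : (m) \in \mathcal{M}\}$ be a finite family of design matrices
$G_m \in \mathbb{R}^{n \times m}$, and let $\hat\Sigma_m = \hat\Sigma(G_m)$,
$(m) \in \mathcal{M}$, be the corresponding least squares covariance estimators. The problem
of interest is to select the best of these estimators in the sense of the minimal quadratic
risk $\mathbb{E} \| \Sigma - \hat\Sigma_m \|^2$.

The main theorem of this section provides a non-asymptotic bound for the risk of a penalized
strategy for this problem. For all $(m) \in \mathcal{M}$, write

$$\Pi_m = G_m (G_m^\top G_m)^{-} G_m^\top, \qquad D_m = \mathrm{Tr}(\Pi_m). \tag{3.5}$$

We assume that $D_m \ge 1$ for all $(m) \in \mathcal{M}$. The estimation error for a given
model $(m) \in \mathcal{M}$ is given by

$$\mathbb{E} \| \Sigma - \hat\Sigma_m \|^2 = \| \Sigma - \Pi_m \Sigma \Pi_m \|^2 + \frac{\delta_m^2 D_m}{N}, \tag{3.6}$$

where

$$\delta_m^2 = \frac{\mathrm{Tr}\big( (\Pi_m \otimes \Pi_m) \Phi \big)}{D_m}, \qquad \Phi = V\big( \mathrm{vec}(x_1 x_1^\top) \big).$$

Given $\theta > 0$, define the penalized covariance estimator
$\tilde\Sigma = \hat\Sigma_{\hat m}$ by

$$\hat m = \arg\min_{(m) \in \mathcal{M}} \Big\{ \frac{1}{N} \sum_{i=1}^{N} \| x_i x_i^\top - \hat\Sigma_m \|^2 + \mathrm{pen}(m) \Big\},$$

where

$$\mathrm{pen}(m) = (1 + \theta) \frac{\delta_m^2 D_m}{N}. \tag{3.7}$$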



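Theorem 3.3. Let $q > 0$ be given such that there exists $p > 2(1+q)$ satisfying
$\mathbb{E} \| x_1 x_1^\top \|^p < \infty$. Then, for some constants $K(\theta) > 1$ and
$C(\theta, p, q) > 0$ we have that

$$\Big( \mathbb{E} \| \Sigma - \tilde\Sigma \|^{2q} \Big)^{1/q} \le 2^{(q^{-1} - 1)_+} \Big[ K(\theta) \inf_{(m) \in \mathcal{M}} \Big( \| \Sigma - \Pi_m \Sigma \Pi_m \|^2 + \frac{\delta_m^2 D_m}{N} \Big) + \frac{\Delta_p}{N} \delta_{\sup}^2 \Big],$$

where

$$\Delta_p^q = C(\theta, p, q)\, \mathbb{E} \| x_1 x_1^\top \|^p \Big( \sum_{(m) \in \mathcal{M}} \delta_m^{-p} D_m^{-(p/2 - 1 - q)} \Big)$$

and

$$\delta_{\sup}^2 = \max \{ \delta_m^2 : (m) \in \mathcal{M} \}.$$

In particular, for $q = 1$ we have

$$\mathbb{E} \| \Sigma - \tilde\Sigma \|^2 \le K(\theta) \inf_{(m) \in \mathcal{M}} \mathbb{E} \| \Sigma - \hat\Sigma_m \|^2 + \frac{\Delta_p}{N} \delta_{\sup}^2. \tag{3.8}$$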



For the proof of this result, we first restate this theorem in a vectorized form, which
turns out to be a d-variate extension of results in [Bar00] (which are covered when $d = 1$)
and is stated in Section 4.1. The proofs rely on model selection techniques and on a
concentration tool stated in Section 4.2.

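Remark 3.4. The penalty depends on the quantity $\delta_m$. Note that

$$\frac{\delta_m^2 D_m}{N} = \mathbb{E} \| \hat\Sigma_m - \Pi_m \Sigma \Pi_m \|^2 = \mathrm{Tr}\big( V(\mathrm{vec}(\hat\Sigma_m)) \big) = \frac{\mathrm{Tr}\big( (\Pi_m \otimes \Pi_m) \Phi \big)}{N}. \tag{3.9}$$

So, we get that $\delta_m^2 \le \lambda_{\max}(\Phi)$ for all $(m)$. Hence Theorem 3.3
remains true if $\delta_m^2$ is replaced by $\lambda_{\max}(\Phi)$ in all the statements.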

Remark 3.5. The penalty thus relies on $\Phi = V(\mathrm{vec}(x_1 x_1^\top))$. This quantity
reflects the correlation structure of the data. We point out that, for practical purposes,
this quantity can be estimated using the empirical version of $\Phi$, since the $x_i$,
$i = 1, \dots, N$, are i.i.d. observed random variables. In the original paper by Baraud
[Bar02], an estimator of the variance is proposed to overcome this issue. However, the
consistency proof relies on a concentration inequality which turns out to be a
$\chi^2$-like inequality. Extending this inequality to our case would require constructing
concentration bounds for matrices $x x^\top$, involving Wishart distributions. Although some
results exist in this framework [RMSE08], adapting this kind of construction to our case
falls beyond the scope of this paper.
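To make the penalized criterion concrete, here is a minimal sketch of the selection rule with $\Phi$ replaced by its empirical version, as suggested in Remark 3.5. The function name `select_model`, the default value of `theta`, the nested polynomial designs and the simulated data are all assumptions made for illustration; the theoretical guarantees above are stated for the true $\Phi$, not for this plug-in version.

```python
import numpy as np

def select_model(X, designs, theta=1.0):
    """X: (N, n) data matrix; designs: list of (n, m) design matrices G_m.
    Returns the index of the selected model and the penalized criteria."""
    N, n = X.shape
    outer = np.einsum("ij,ik->ijk", X, X)        # x_i x_i^T, shape (N, n, n)
    S = outer.mean(axis=0)
    # Empirical version of Phi = V(vec(x x^T)). Row-major flattening is used;
    # this is harmless here because every x_i x_i^T is symmetric.
    Phi_hat = np.cov(outer.reshape(N, n * n), rowvar=False, bias=True)
    crits = []
    for G in designs:
        Pi = G @ np.linalg.pinv(G)               # projector Pi_m
        Sigma_m = Pi @ S @ Pi                    # least squares estimator for model m
        fit = np.mean(np.sum((outer - Sigma_m) ** 2, axis=(1, 2)))
        # delta_m^2 * D_m = Tr((Pi_m kron Pi_m) Phi), hence the penalty (3.7):
        pen = (1.0 + theta) * np.trace(np.kron(Pi, Pi) @ Phi_hat) / N
        crits.append(fit + pen)
    return int(np.argmin(crits)), crits

# Hypothetical usage with nested polynomial designs of size m = 1, ..., 5.
rng = np.random.default_rng(3)
N, n = 300, 10
t = np.linspace(0.0, 1.0, n)
designs = [np.vander(t, m, increasing=True) for m in range(1, 6)]
X = rng.standard_normal((N, 2)) @ designs[1].T + 0.05 * rng.standard_normal((N, n))
m_hat, crits = select_model(X, designs)
```

Forming the full Kronecker product costs $O(n^4)$ memory, so this direct implementation is only reasonable for a small number of observation points.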






We have obtained in Theorem 3.3 an oracle inequality since, using (3.6) and (3.8), one
immediately sees that $\tilde\Sigma$ has the same quadratic risk as the "oracle" estimator,
except for an additive term of order $O(1/N)$ and a constant factor. Hence, the selection
procedure is optimal in the sense that it behaves as if the true model were at hand. To
describe the result in terms of rate of convergence, we have to pay special attention to the
bias terms $\| \Sigma - \Pi_m \Sigma \Pi_m \|^2$. In a very general framework, it is difficult
to evaluate such approximation terms. If the process has bounded second moments, i.e. for all
$j = 1, \dots, n$ we have $\mathbb{E}(X^2(t_j)) \le C$, then we can write

$$\| \Sigma - \Pi_m \Sigma \Pi_m \|^2 \le 2 C^2 n^2\, \frac{1}{n} \sum_{j=1}^{n} \mathbb{E}\big( X(t_j) - \tilde X(t_j) \big)^2.$$

Since $n$ is fixed and the asymptotics are given with respect to $N$, the number of
replications of the process, the rate of convergence relies on the quadratic error of the
expansion of the process.

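For example, take $d = 1$, $T = [a, b]$,
$\mathcal{M} = \mathcal{M}_N = \{ (m) = \{1, \dots, m\},\ m = 1, \dots, N \}$, and for a
process $X(t)$ with $t \in [a, b]$, consider its Karhunen-Loève expansion (see for instance
[Adl90]), i.e. write

$$X(t) = \sum_{\lambda = 1}^{\infty} Z_\lambda\, g_\lambda(t),$$

where $Z_\lambda$ are centered random variables with
$\mathbb{E}(Z_\lambda^2) = \gamma_\lambda^2$, where $\gamma_\lambda^2$ is the eigenvalue
corresponding to the eigenfunction $g_\lambda$ of the operator
$(Kf)(t) = \int_a^b \sigma(s,t) f(s)\, ds$. If $X(t)$ is a Gaussian process, then the random
variables $Z_\lambda$ are Gaussian and stochastically independent. Hence, a natural
approximation of $X(t)$ is given by

$$\tilde X(t) = \sum_{\lambda = 1}^{m} Z_\lambda\, g_\lambda(t).$$

So we have that

$$\mathbb{E}\big( X(t) - \tilde X(t) \big)^2 = \mathbb{E}\Big( \sum_{\lambda = m+1}^{\infty} Z_\lambda\, g_\lambda(t) \Big)^2 = \sum_{\lambda = m+1}^{\infty} \gamma_\lambda^2\, g_\lambda^2(t),$$

therefore, if $\| g_\lambda \|_{L^2([a,b])}^2 = 1$, then
$\mathbb{E} \| X - \tilde X \|_{L^2([a,b])}^2 = \sum_{\lambda = m+1}^{\infty} \gamma_\lambda^2$.
Assume that the $\gamma_\lambda$'s have a polynomial decay of rate $\alpha > 0$, namely
$\gamma_\lambda \sim \lambda^{-\alpha}$; then we get an approximation error of order
$O((m+1)^{-2\alpha})$. Hence, we get that (under appropriate conditions on the design points
$t_1, \dots, t_n$)

$$\| \Sigma - \Pi_m \Sigma \Pi_m \|^2 = O\big( (m+1)^{-2\alpha} \big).$$

Finally, since in this example

$$\mathbb{E} \| \Sigma - \tilde\Sigma \|^2 \le K(\theta) \inf_{(m) \in \mathcal{M}_N} \Big( \| \Sigma - \Pi_m \Sigma \Pi_m \|^2 + \frac{\delta_m^2 D_m}{N} \Big) + O\Big( \frac{1}{N} \Big),$$

the quadratic risk is of order $N^{-\frac{2\alpha}{2\alpha+1}}$ as soon as
$m \sim N^{1/(2\alpha+1)}$ belongs to the collection of models $\mathcal{M}_N$. In another
framework, if we consider a spline expansion, the rates of convergence for the approximation
given in [ETA03] are of the same order.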
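The bias-variance trade-off behind this rate can be illustrated with a small numerical sketch. The proxy risk $m^{-2\alpha} + m/N$ used below, with all constants set to one, is an assumption made for illustration only; it mimics the two terms of the oracle bound in this example.

```python
import numpy as np

def best_dimension(N, alpha):
    # Minimize the proxy risk m^(-2*alpha) + m/N over m = 1, ..., N.
    m = np.arange(1, N + 1, dtype=float)
    risk = m ** (-2.0 * alpha) + m / N
    k = int(np.argmin(risk))
    return int(m[k]), float(risk[k])

alpha = 1.0
for N in (10**2, 10**3, 10**4):
    m_star, r_star = best_dimension(N, alpha)
    # Compare the minimizer with the theoretical order N^(1/(2*alpha+1)).
    print(N, m_star, round(N ** (1.0 / (2 * alpha + 1)), 2), r_star)
```

The minimizing dimension grows like $N^{1/(2\alpha+1)}$ up to a constant factor, and the minimal proxy risk decays like $N^{-2\alpha/(2\alpha+1)}$.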

Hence we have obtained a model selection procedure which makes it possible to recover the
best covariance model among a given collection. This method works without strong assumptions
on the process (in particular, stationarity is not assumed), but at the expense of requiring
i.i.d. observations of the process at the same points. Nevertheless, the range of
applications is broad, especially in geophysics or epidemiology.



4 Model Selection for Multidimensional Regression 

4.1 Oracle Inequality for multidimensional regression model 

Recall that we consider the following model

$$x_i x_i^\top = G \Psi G^\top + U_i, \quad i = 1, \dots, N,$$

with i.i.d. matrix errors $U_i$, $\mathbb{E}(U_i) = 0$. This model can be equivalently
rewritten in vectorized form in the following way:

$$y = A \beta + u,$$

where $y$ is a data vector, $\mathbb{E}(u) = 0$, $A$ is a known fixed matrix, and
$\beta = \mathrm{vech}(\Psi)$ is an unknown vector parameter. It is worth noting that this
regression model has several peculiarities in comparison with standard ones.

i) The error $u$ has a specific correlation structure, namely $I_N \otimes \Phi$, where
$\Phi = V(\mathrm{vec}(x_1 x_1^\top))$.

ii) In contrast with standard multivariate models, each coordinate of $y$ depends on all the
coordinates of $\beta$.

iii) For any estimator $\hat\Sigma = G \hat\Psi G^\top$ that is a linear function of the
sample covariance $S$ of the data $x_1, \dots, x_N$ (and so, in particular, for the estimator
minimizing $L_N$), it is possible to construct an unbiased estimator of its quadratic risk
$\mathbb{E} \| \Sigma - \hat\Sigma \|^2$.
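Assume we observe $y_i$, $i = 1, \dots, N$, random vectors of $\mathbb{R}^d$ such that

$$y_i = f^i + \varepsilon_i, \quad i = 1, \dots, N, \tag{4.1}$$

where $f^i \in \mathbb{R}^d$ are nonrandom and $\varepsilon_1, \dots, \varepsilon_N$ are
i.i.d. random vectors in $\mathbb{R}^d$ with $\mathbb{E}(\varepsilon_1) = 0$ and
$V(\varepsilon_1) = \Phi$. For the sake of simplicity, we identify the function
$g : \mathcal{X} \to \mathbb{R}^d$ with the vector
$(g(x_1), \dots, g(x_N))^\top \in \mathbb{R}^{Nd}$, and we denote by
$\langle a, b \rangle_N = \frac{1}{N} \sum_{i=1}^{N} a_i^\top b_i$, with
$a = (a_1, \dots, a_N)^\top$, $b = (b_1, \dots, b_N)^\top$ and $a_i, b_i \in \mathbb{R}^d$,
the inner product of $\mathbb{R}^{Nd}$ associated to the norm $\| \cdot \|_N$.

Given $N, d \in \mathbb{N}$, let $(\mathcal{L}_m)_{(m) \in \mathcal{M}}$ be a finite family of
linear subspaces of $\mathbb{R}^{Nd}$. For each $(m) \in \mathcal{M}$, assume $\mathcal{L}_m$
has dimension $D_m \ge 1$. For each $(m) \in \mathcal{M}$, let $\hat f_m$ be the least squares
estimator of $f = ((f^1)^\top, \dots, (f^N)^\top)^\top$ based on the data
$y = (y_1^\top, \dots, y_N^\top)^\top$ under the model $\mathcal{L}_m$, i.e.,

$$\hat f_m = \arg\min_{v \in \mathcal{L}_m} \{ \| y - v \|_N^2 \} = P_m y,$$

where $P_m$ is the projector matrix from $\mathbb{R}^{Nd}$ on $\mathcal{L}_m$. Write

$$\delta_m^2 = \frac{\mathrm{Tr}\big( P_m (I_N \otimes \Phi) \big)}{D_m}, \qquad \delta_{\sup}^2 = \max \{ \delta_m^2 : (m) \in \mathcal{M} \}.$$

Given $\theta > 0$, define the penalized estimator $\tilde f = \hat f_{\hat m}$, where

$$\hat m = \arg\min_{(m) \in \mathcal{M}} \Big\{ \| y - \hat f_m \|_N^2 + \mathrm{pen}(m) \Big\},$$

with

$$\mathrm{pen}(m) = (1 + \theta) \frac{\delta_m^2 D_m}{N}.$$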

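Proposition 4.1. Let $q > 0$ be given such that there exists $p > 2(1+q)$ satisfying
$\mathbb{E} \| \varepsilon_1 \|^p < \infty$. Then, for some constants $K(\theta) > 1$ and
$c(\theta, p, q) > 0$ we have that

$$\mathbb{E} \Big( \| f - \tilde f \|_N^2 - K(\theta) M_N^* \Big)_+^q \le \Delta_p^q\, \frac{\delta_{\sup}^{2q}}{N^q}, \tag{4.2}$$

where

$$\Delta_p^q = c(\theta, p, q)\, \mathbb{E} \| \varepsilon_1 \|^p \Big( \sum_{(m) \in \mathcal{M}} \delta_m^{-p} D_m^{-(p/2 - 1 - q)} \Big),$$

$$M_N^* = \inf_{(m) \in \mathcal{M}} \Big( \| f - P_m f \|_N^2 + \frac{\delta_m^2 D_m}{N} \Big).$$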

This proposition is equivalent to Theorem 3.3 using the vectorized version of the model
(4.1), and turns out to be an extension of Theorem 3.1 in [Bar00] to the multivariate case.
In a similar way, the following result constitutes a natural extension of Corollary 3.1 in
[Bar00]. It is also closely related to the recent work in [Gen08].

Corollary 4.2. Under the assumptions of Proposition 4.1 it holds that

$$\Big( \mathbb{E} \| f - \tilde f \|_N^{2q} \Big)^{1/q} \le 2^{(q^{-1} - 1)_+} \Big[ K(\theta) \inf_{(m) \in \mathcal{M}} \Big( \| f - P_m f \|_N^2 + \frac{\delta_m^2 D_m}{N} \Big) + \frac{\Delta_p}{N} \delta_{\sup}^2 \Big],$$

where $\Delta_p$ was defined in Proposition 4.1.

Under regularity assumptions for the function $f$, depending on a smoothness parameter $s$,
the bias term is of order

$$\| f - P_m f \|_N^2 = O(D_m^{-2s}).$$

Hence, for $q = 1$ we obtain the usual rate of convergence $N^{-\frac{2s}{2s+1}}$ for the
quadratic risk as soon as the optimal choice $D_m = N^{\frac{1}{2s+1}}$ belongs to the
collection of models, yielding the optimal rate of convergence for the penalized estimator.



4.2 Concentration Bound for multidimensional random process 

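These results are d-variate extensions of results in [Bar00] (which are covered when
$d = 1$). Their proofs are deferred to the Appendix.

Proposition 4.3. (Extension of Corollary 5.1 in [Bar00]). Given $N, d \in \mathbb{N}$, let
$\tilde A \in \mathbb{R}^{Nd \times Nd} \setminus \{0\}$ be a d.n.n. matrix and
$\varepsilon_1, \dots, \varepsilon_N$ i.i.d. random vectors in $\mathbb{R}^d$ with
$\mathbb{E}(\varepsilon_1) = 0$ and $V(\varepsilon_1) = \Phi$. Write
$\varepsilon = (\varepsilon_1^\top, \dots, \varepsilon_N^\top)^\top$,
$\zeta(\varepsilon) = \sqrt{\varepsilon^\top \tilde A \varepsilon}$, and
$\delta^2 = \mathrm{Tr}\big( \tilde A (I_N \otimes \Phi) \big) / \mathrm{Tr}(\tilde A)$. For
all $p \ge 2$ such that $\mathbb{E} \| \varepsilon_1 \|^p < \infty$ it holds that, for all
$x > 0$,

$$\mathbb{P}\Big( \zeta^2(\varepsilon) \ge \delta^2\, \mathrm{Tr}(\tilde A) + 2 \delta^2 \sqrt{\mathrm{Tr}(\tilde A)\, \rho(\tilde A)\, x} + \delta^2 \rho(\tilde A)\, x \Big) \le C(p)\, \frac{\mathbb{E} \| \varepsilon_1 \|^p\, \mathrm{Tr}(\tilde A)}{\delta^p\, \rho(\tilde A)\, x^{p/2}}, \tag{4.3}$$

where the constant $C(p)$ depends only on $p$.

Proposition 4.3 reduces to Corollary 5.1 in [Bar00] when we only consider $d = 1$, in which
case $\delta^2 = (\Phi)_{11} = \sigma^2$ is the variance of the univariate i.i.d. errors
$\varepsilon_i$.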

5 Appendix 

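5.1 Proofs of Preliminary Results

Proof of Proposition 3.1

Proof. a) The minimization problem posed in this proposition is equivalent to minimizing

$$h(\Psi) = \| \bar Y - G \Psi G^\top \|^2.$$

The Frobenius norm $\| \cdot \|$ is invariant under the vec operation. Furthermore,
$\Psi \in \mathcal{S}_m$ can be represented by means of $\mathrm{vec}(\Psi) = D_m \beta$,
where $\beta \in \mathbb{R}^{m(m+1)/2}$. These facts and the identity

$$\mathrm{vec}(ABC) = (C^\top \otimes A)\, \mathrm{vec}(B) \tag{5.1}$$

allow one to rewrite

$$h(\Psi) = \| y - (G \otimes G) D_m \beta \|^2,$$

where $y = \mathrm{vec}(\bar Y)$. Minimization of this quadratic function with respect to
$\beta$ in $\mathbb{R}^{m(m+1)/2}$ is equivalent to solving the normal equation

$$D_m^\top (G \otimes G)^\top (G \otimes G) D_m \beta = D_m^\top (G \otimes G)^\top y.$$

By using the identity

$$D_m^\top \mathrm{vec}(A) = \mathrm{vech}\big( A + A^\top - \mathrm{diag}(A) \big)$$

and (5.1), said normal equation can be rewritten as

$$\mathrm{vech}\big( G^\top G (\Psi + \Psi^\top) G^\top G - \mathrm{diag}(G^\top G \Psi G^\top G) \big) = \mathrm{vech}\big( G^\top (\bar Y + \bar Y^\top) G - \mathrm{diag}(G^\top \bar Y G) \big).$$

Finally, it can be verified that $\hat\Psi$ given by (3.3) satisfies this equation, as a
consequence of the fact that for such $\hat\Psi$ it holds that

$$\mathrm{vech}\big( G^\top G \hat\Psi G^\top G \big) = \mathrm{vech}\Big( G^\top \Big( \frac{\bar Y + \bar Y^\top}{2} \Big) G \Big).$$

b) It follows straightforwardly from part a). □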
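The vectorized argument of part a) can be reproduced numerically. The following sketch solves the least squares problem in $\beta = \mathrm{vech}(\Psi)$ through the matrix $(G \otimes G) D_m$ and compares the result with the closed form (3.3). It assumes a design of full column rank so that the minimizer is unique; the helper name `duplication_matrix` and the random inputs are illustrative.

```python
import numpy as np

def duplication_matrix(k):
    # D_k vech(A) = vec(A) for symmetric A (column-major vec, lower-triangular vech).
    D = np.zeros((k * k, k * (k + 1) // 2))
    col = 0
    for j in range(k):
        for i in range(j, k):
            D[j * k + i, col] = 1.0
            D[i * k + j, col] = 1.0
            col += 1
    return D

rng = np.random.default_rng(4)
n, m = 5, 3
G = rng.standard_normal((n, m))            # full column rank almost surely
Ybar = rng.standard_normal((n, n))         # plays the role of the averaged matrix
D = duplication_matrix(m)

# By identity (5.1): vec(G Psi G^T) = (G kron G) vec(Psi) = (G kron G) D_m vech(Psi).
A = np.kron(G, G) @ D
y = Ybar.reshape(-1, order="F")
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
Psi_ls = (D @ beta).reshape(m, m, order="F")

# Closed form (3.3) with an ordinary inverse.
GtG_inv = np.linalg.inv(G.T @ G)
Psi_closed = GtG_inv @ G.T @ ((Ybar + Ybar.T) / 2) @ G @ GtG_inv
```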

5.2 Proofs of Main Results 

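Proof of Proposition 4.1

Proof. The proof follows the guidelines of the proof in [Bar00]. More generally, we will
prove that for any $\eta > 0$ and any sequence of positive numbers $L_m$, if the penalty
function $\mathrm{pen} : \mathcal{M} \to \mathbb{R}_+$ is chosen to satisfy

$$\mathrm{pen}(m) = (1 + \eta + L_m) \frac{\delta_m^2}{N} D_m \quad \text{for all } (m) \in \mathcal{M}, \tag{5.2}$$

then for each $x > 0$ and $p \ge 2$,

$$\mathbb{P}\Big( H(f) \ge \Big( 1 + \frac{2}{\eta} \Big) \frac{x}{N} \delta_{\sup}^2 \Big) \le c(p, \eta)\, \mathbb{E} \| \varepsilon_1 \|^p \sum_{(m) \in \mathcal{M}} \frac{1}{\delta_m^p} \frac{D_m \vee 1}{(L_m D_m + x)^{p/2}}, \tag{5.3}$$

where we have set

$$H(f) = \Big[ \| f - \tilde f \|_N^2 - \Big( 2 + \frac{4}{\eta} \Big) \inf_{(m) \in \mathcal{M}} \big\{ d_N^2(f, \mathcal{L}_m) + \mathrm{pen}(m) \big\} \Big]_+.$$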



+ 



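To obtain (4.2), take $\eta = \theta/2 = L_m$. As for each $(m) \in \mathcal{M}$,

$$d_N^2(f, \mathcal{L}_m) + \mathrm{pen}(m) \le d_N^2(f, \mathcal{L}_m) + (1 + \theta) \frac{\delta_m^2}{N} D_m \le (1 + \theta) \Big( d_N^2(f, \mathcal{L}_m) + \frac{\delta_m^2}{N} D_m \Big),$$

we get that for all $q > 0$,

$$H^q(f) \ge \Big[ \| f - \tilde f \|_N^2 - \Big( 2 + \frac{8}{\theta} \Big)(1 + \theta) M_N^* \Big]_+^q = \Big[ \| f - \tilde f \|_N^2 - K(\theta) M_N^* \Big]_+^q, \tag{5.4}$$

where $K(\theta) = \big( 2 + \frac{8}{\theta} \big)(1 + \theta)$. Since

$$\mathbb{E}\big( H^q(f) \big) = \int_0^\infty q u^{q-1}\, \mathbb{P}\big( H(f) > u \big)\, du,$$

we derive from (5.4) and (5.3) that for all $p > 2(1+q)$

$$\mathbb{E} \Big[ \| f - \tilde f \|_N^2 - K(\theta) M_N^* \Big]_+^q \le \mathbb{E}\big( H^q(f) \big) \le c(p, \theta) \Big( 1 + \frac{4}{\theta} \Big)^q \frac{\mathbb{E} \| \varepsilon_1 \|^p}{N^q} \sum_{(m) \in \mathcal{M}} \frac{\delta_{\sup}^{2q}}{\delta_m^p} \int_0^\infty q x^{q-1} \Big[ \frac{D_m \vee 1}{(\frac{\theta}{2} D_m + x)^{p/2}} \wedge 1 \Big] dx$$

$$\le c(p, q, \theta)\, \frac{\mathbb{E} \| \varepsilon_1 \|^p}{N^q}\, \delta_{\sup}^{2q} \sum_{(m) \in \mathcal{M}} \delta_m^{-p} D_m^{-(p/2 - 1 - q)},$$

using that $\mathbb{P}(H(f) > u) \le 1$.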

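Indeed, for $m \in \mathcal{M}$ such that $D_m \geq 1$, using that $q - 1 - p/2 < 0$, we get the following bounds:

$$\delta_m^{2q}\,\delta_m^{-p} \int_0^\infty q x^{q-1}\left[\frac{D_m \vee 1}{\left(\frac{\theta}{2} D_m + x\right)^{p/2}} \wedge 1\right] dx \leq \delta_{\sup}^{2q}\,\delta_m^{-p} \int_0^\infty q x^{q-1}\,\frac{D_m}{\left(\frac{\theta}{2} D_m + x\right)^{p/2}}\, dx$$

$$= \delta_{\sup}^{2q}\,\delta_m^{-p}\left[\int_0^{D_m} q x^{q-1}\,\frac{D_m}{\left(\frac{\theta}{2} D_m + x\right)^{p/2}}\, dx + \int_{D_m}^\infty q x^{q-1}\,\frac{D_m}{\left(\frac{\theta}{2} D_m + x\right)^{p/2}}\, dx\right]$$

$$\leq \delta_{\sup}^{2q}\,\delta_m^{-p}\left[2^{p/2}\theta^{-p/2} D_m^{1-p/2} \int_0^{D_m} q x^{q-1}\, dx + D_m \int_{D_m}^\infty q x^{q-1-p/2}\, dx\right]$$

$$= \delta_{\sup}^{2q}\,\delta_m^{-p}\left[2^{p/2}\theta^{-p/2} D_m^{1-p/2+q} + D_m^{1+q-p/2}\,\frac{q}{p/2 - q}\right]$$

$$= \delta_{\sup}^{2q}\,\delta_m^{-p}\, D_m^{-(p/2-1-q)}\left[2^{p/2}\theta^{-p/2} + \frac{q}{p/2 - q}\right]. \qquad (5.5)$$

Inequality (5.5) allows us to conclude that (4.2) holds, assuming (5.3).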
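The integral bound leading to (5.5) can be checked numerically. The sketch below is only an illustration: it drops the common factor involving $\delta_{\sup}$ and $\delta_m$, assumes $D_m \geq 1$ and $q \geq 1$ (so that the integrand is bounded at the origin), and compares a trapezoidal approximation of $\int_0^\infty q x^{q-1} D_m (\frac{\theta}{2} D_m + x)^{-p/2}\,dx$ with the closed-form bound $D_m^{-(p/2-1-q)}\,[2^{p/2}\theta^{-p/2} + q/(p/2-q)]$. The function names and the parameter triples are hypothetical choices satisfying $p > 2(1+q)$.

```python
import numpy as np

def trapezoid(y, x):
    # Composite trapezoidal rule on a possibly non-uniform grid.
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def integral_lhs(q, p, theta, D):
    # Approximates int_0^inf q x^(q-1) * D / (theta/2 * D + x)^(p/2) dx.
    # The log-spaced grid is truncated at 1e8; the neglected tail is tiny
    # because the integrand decays like x^(q-1-p/2) with q-1-p/2 < -2.
    x = np.concatenate(([0.0], np.logspace(-8, 8, 200001)))
    y = q * x ** (q - 1) * D / (0.5 * theta * D + x) ** (p / 2)
    return trapezoid(y, x)

def bound_rhs(q, p, theta, D):
    # Closed-form upper bound obtained by splitting the integral at x = D.
    bracket = 2 ** (p / 2) * theta ** (-p / 2) + q / (p / 2 - q)
    return D ** (-(p / 2 - 1 - q)) * bracket

# (q, p, theta, D) with p > 2(1 + q) and D >= 1.
cases = [(1.0, 6.0, 1.0, 5.0), (2.0, 8.0, 0.5, 3.0), (1.5, 7.0, 2.0, 10.0)]
results = [(integral_lhs(*c), bound_rhs(*c)) for c in cases]
bound_holds = all(lhs <= rhs for lhs, rhs in results)
```

In these examples the bound is far from tight, which is harmless here since only the dependence on $D_m$ matters for the final result.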



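We now turn to the proof of (5.3). Recall that we identify a function $g$ on $\mathcal{X}$ with the vector $(g(x_1), \dots, g(x_N))^\top \in \mathbb{R}^{Nk}$, where $k$ denotes the dimension of each $g(x_i)$, and we define the empirical scalar product as $\langle a, b \rangle_N = \frac{1}{N}\sum_{i=1}^N a_i^\top b_i$ with $a = (a_1, \dots, a_N)^\top$ and $b = (b_1, \dots, b_N)^\top \in \mathbb{R}^{Nk}$, the inner product of $\mathbb{R}^{Nk}$ associated to the norm $\|\cdot\|_N$. For each $m \in \mathcal{M}$ we denote by $P_m$ the orthogonal projector onto the linear space $\{(g(x_1), \dots, g(x_N))^\top : g \in \mathcal{L}_m\} \subset \mathbb{R}^{Nk}$. This linear space is also denoted by $\mathcal{L}_m$. From now on, the subscript $m$ denotes any minimizer of the function $m' \mapsto \|f - P_{m'} f\|_N^2 + \mathrm{pen}(m')$, $m' \in \mathcal{M}$. For any $g \in \mathbb{R}^{Nk}$ we define the least-squares loss function by

$$\gamma_N(g) = \|y - g\|_N^2 .$$

Using the definition of $\gamma_N$ we have that for all $g \in \mathbb{R}^{Nk}$,

$$\gamma_N(g) = \|f + \varepsilon - g\|_N^2 .$$

Then we derive that

$$\|f - g\|_N^2 = \gamma_N(g) + 2\langle g - y, \varepsilon \rangle_N + \|\varepsilon\|_N^2 ,$$

and therefore

$$\|f - \tilde f\|_N^2 - \|f - P_m f\|_N^2 = \gamma_N(\tilde f) - \gamma_N(P_m f) + 2\langle \tilde f - P_m f, \varepsilon \rangle_N . \qquad (5.6)$$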



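By the definition of $\tilde f$, we know that

$$\gamma_N(\tilde f) + \mathrm{pen}(\hat m) \leq \gamma_N(g) + \mathrm{pen}(m)$$

for all $m \in \mathcal{M}$ and for all $g \in \mathcal{L}_m$. Then

$$\gamma_N(\tilde f) - \gamma_N(P_m f) \leq \mathrm{pen}(m) - \mathrm{pen}(\hat m). \qquad (5.7)$$

So we get from (5.6) and (5.7) that

$$\|f - \tilde f\|_N^2 \leq \|f - P_m f\|_N^2 + \mathrm{pen}(m) - \mathrm{pen}(\hat m) + 2\langle f - P_m f, \varepsilon \rangle_N + 2\langle P_{\hat m} f - f, \varepsilon \rangle_N + 2\langle \tilde f - P_{\hat m} f, \varepsilon \rangle_N . \qquad (5.8)$$

In the following we set, for each $m' \in \mathcal{M}$,

$$B_{m'} = \{g \in \mathcal{L}_{m'} : \|g\|_N \leq 1\}, \qquad G_{m'} = \sup_{g \in B_{m'}} \langle g, \varepsilon \rangle_N = \|P_{m'} \varepsilon\|_N ,$$

$$u_{m'} = \begin{cases} \dfrac{P_{m'} f - f}{\|P_{m'} f - f\|_N} & \text{if } \|P_{m'} f - f\|_N \neq 0, \\ 0 & \text{otherwise.} \end{cases}$$

Since $\tilde f = P_{\hat m} f + P_{\hat m} \varepsilon$, (5.8) gives

$$\|f - \tilde f\|_N^2 \leq \|f - P_m f\|_N^2 + \mathrm{pen}(m) - \mathrm{pen}(\hat m) + 2\|f - P_m f\|_N\, |\langle u_m, \varepsilon \rangle_N| + 2\|f - P_{\hat m} f\|_N\, |\langle u_{\hat m}, \varepsilon \rangle_N| + 2 G_{\hat m}^2 . \qquad (5.9)$$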



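Using repeatedly the following elementary inequality, which holds for all positive numbers $\alpha$, $x$, $z$,

$$2xz \leq \alpha x^2 + \frac{1}{\alpha} z^2 , \qquad (5.10)$$

we get for any $m' \in \mathcal{M}$

$$2\|f - P_{m'} f\|_N\, |\langle u_{m'}, \varepsilon \rangle_N| \leq \alpha \|f - P_{m'} f\|_N^2 + \frac{1}{\alpha} |\langle u_{m'}, \varepsilon \rangle_N|^2 . \qquad (5.11)$$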
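For completeness, the elementary inequality used here follows by completing the square. The short derivation below is added for the reader's convenience; it writes the three positive numbers of the statement as $\alpha$, $x$ and $z$.

```latex
\[
\alpha x^2 + \frac{1}{\alpha} z^2 - 2xz
  = \left( \sqrt{\alpha}\, x - \frac{z}{\sqrt{\alpha}} \right)^2 \;\geq\; 0 ,
\]
```

Equality holds exactly when $z = \alpha x$. Applying the inequality with $x = \|f - P_{m'} f\|_N$ and $z = |\langle u_{m'}, \varepsilon \rangle_N|$ gives the bound on the cross term stated just above.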



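By Pythagoras' theorem we have

$$\|f - \tilde f\|_N^2 = \|f - P_{\hat m} f\|_N^2 + \|P_{\hat m} f - \tilde f\|_N^2 = \|f - P_{\hat m} f\|_N^2 + G_{\hat m}^2 . \qquad (5.12)$$

We derive from (5.9) and (5.11) that for any $\alpha > 0$:

$$\|f - \tilde f\|_N^2 \leq \|f - P_m f\|_N^2 + \alpha \|f - P_m f\|_N^2 + \frac{1}{\alpha} \langle u_m, \varepsilon \rangle_N^2 + \alpha \|f - P_{\hat m} f\|_N^2 + \frac{1}{\alpha} \langle u_{\hat m}, \varepsilon \rangle_N^2 + 2 G_{\hat m}^2 + \mathrm{pen}(m) - \mathrm{pen}(\hat m).$$

Now, taking into account that by equation (5.12) $\|f - P_{\hat m} f\|_N^2 = \|f - \tilde f\|_N^2 - G_{\hat m}^2$, the above inequality is equivalent to:

$$(1 - \alpha)\|f - \tilde f\|_N^2 \leq (1 + \alpha)\|f - P_m f\|_N^2 + \frac{1}{\alpha} \langle u_m, \varepsilon \rangle_N^2 + \frac{1}{\alpha} \langle u_{\hat m}, \varepsilon \rangle_N^2 + (2 - \alpha) G_{\hat m}^2 + \mathrm{pen}(m) - \mathrm{pen}(\hat m). \qquad (5.13)$$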
a 

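We choose $\alpha = \frac{2}{2 + \eta} \in\, ]0, 1[$, but for the sake of simplicity we keep using the notation $\alpha$. Let $p_1$ and $p_2$ be two functions depending on $\eta$, mapping $\mathcal{M}$ into $\mathbb{R}_+$. They will be specified later to satisfy

$$\mathrm{pen}(m') \geq (2 - \alpha)\, p_1(m') + \frac{1}{\alpha}\, p_2(m') \quad \forall\, m' \in \mathcal{M}. \qquad (5.14)$$

Since $\frac{1}{\alpha} p_2(m') \leq \mathrm{pen}(m')$ and $1 + \alpha \leq 2$, we get from (5.13) and (5.14) that

$$(1 - \alpha)\|f - \tilde f\|_N^2 \leq (1 + \alpha)\|f - P_m f\|_N^2 + \mathrm{pen}(m) + \frac{1}{\alpha} p_2(m) + (2 - \alpha)\left(G_{\hat m}^2 - p_1(\hat m)\right) + \frac{1}{\alpha}\left(\langle u_{\hat m}, \varepsilon \rangle_N^2 - p_2(\hat m)\right) + \frac{1}{\alpha}\left(\langle u_m, \varepsilon \rangle_N^2 - p_2(m)\right)$$

$$\leq 2\left(\|f - P_m f\|_N^2 + \mathrm{pen}(m)\right) + (2 - \alpha)\left(G_{\hat m}^2 - p_1(\hat m)\right) + \frac{1}{\alpha}\left(\langle u_{\hat m}, \varepsilon \rangle_N^2 - p_2(\hat m)\right) + \frac{1}{\alpha}\left(\langle u_m, \varepsilon \rangle_N^2 - p_2(m)\right). \qquad (5.15)$$

As $\frac{2}{1 - \alpha} = 2 + \frac{4}{\eta}$, we obtain that

$$(1 - \alpha) H(f) = \left[(1 - \alpha)\|f - \tilde f\|_N^2 - (1 - \alpha)\left(2 + \frac{4}{\eta}\right) \inf_{m' \in \mathcal{M}}\left(\|f - P_{m'} f\|_N^2 + \mathrm{pen}(m')\right)\right]_+$$

$$= \left[(1 - \alpha)\|f - \tilde f\|_N^2 - 2\left(\|f - P_m f\|_N^2 + \mathrm{pen}(m)\right)\right]_+$$

$$\leq \left[(2 - \alpha)\left(G_{\hat m}^2 - p_1(\hat m)\right) + \frac{1}{\alpha}\left(\langle u_{\hat m}, \varepsilon \rangle_N^2 - p_2(\hat m)\right) + \frac{1}{\alpha}\left(\langle u_m, \varepsilon \rangle_N^2 - p_2(m)\right)\right]_+ ,$$

using (5.15) and the fact that $m$ minimizes the function $m' \mapsto \|f - P_{m'} f\|_N^2 + \mathrm{pen}(m')$.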
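For any $x > 0$,

$$P\left((1 - \alpha) H(f) \geq \frac{x \delta_{m'}^2}{N}\right) \leq P\left(\exists\, m' \in \mathcal{M} : (2 - \alpha)\left(G_{m'}^2 - p_1(m')\right) \geq \frac{x \delta_{m'}^2}{3N}\right) + P\left(\exists\, m' \in \mathcal{M} : \frac{1}{\alpha}\left(\langle u_{m'}, \varepsilon \rangle_N^2 - p_2(m')\right) \geq \frac{x \delta_{m'}^2}{3N}\right)$$

$$\leq \sum_{m' \in \mathcal{M}} P\left((2 - \alpha)\left(\|P_{m'} \varepsilon\|_N^2 - p_1(m')\right) \geq \frac{x \delta_{m'}^2}{3N}\right) + \sum_{m' \in \mathcal{M}} P\left(\frac{1}{\alpha}\left(\langle u_{m'}, \varepsilon \rangle_N^2 - p_2(m')\right) \geq \frac{x \delta_{m'}^2}{3N}\right)$$

$$:= \sum_{m' \in \mathcal{M}} P_{1,m'}(x) + \sum_{m' \in \mathcal{M}} P_{2,m'}(x). \qquad (5.16)$$

We first bound $P_{2,m'}(x)$. Let $t$ be some positive number. By Markov's inequality,

$$P\left(|\langle u_{m'}, \varepsilon \rangle_N| \geq t\right) \leq t^{-p}\, E\left(|\langle u_{m'}, \varepsilon \rangle_N|^p\right). \qquad (5.17)$$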
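Since $\langle u_{m'}, \varepsilon \rangle_N = \frac{1}{N}\sum_{i=1}^N \langle u_{im'}, \varepsilon_i \rangle$ with $\varepsilon_i$ i.i.d. and with zero mean, by Rosenthal's inequality we know that, for some constant $c(p)$ that depends on $p$ only,

$$c^{-1}(p)\, N^p\, E|\langle u_{m'}, \varepsilon \rangle_N|^p \leq \sum_{i=1}^N E|\langle u_{im'}, \varepsilon_i \rangle|^p + \left(\sum_{i=1}^N E\langle u_{im'}, \varepsilon_i \rangle^2\right)^{p/2}$$

$$\leq \sum_{i=1}^N E\left(\|u_{im'}\|^p \|\varepsilon_i\|^p\right) + \left(\sum_{i=1}^N E\left(\|u_{im'}\|^2 \|\varepsilon_i\|^2\right)\right)^{p/2}$$

$$= E\|\varepsilon_1\|^p \sum_{i=1}^N \|u_{im'}\|^p + \left(E\|\varepsilon_1\|^2\right)^{p/2} \left(\sum_{i=1}^N \|u_{im'}\|^2\right)^{p/2}. \qquad (5.18)$$

Since $p \geq 2$, $\left(E\|\varepsilon_1\|^2\right)^{1/2} \leq \left(E\|\varepsilon_1\|^p\right)^{1/p}$ and

$$\left(E\|\varepsilon_1\|^2\right)^{p/2} \leq E\|\varepsilon_1\|^p . \qquad (5.19)$$

Using also that by definition $\|u_{m'}\|_N^2 = \frac{1}{N}\sum_{i=1}^N \|u_{im'}\|^2 = 1$, we have $\frac{\|u_{im'}\|^2}{N} \leq 1$ and therefore $\frac{\|u_{im'}\|}{N^{1/2}} \leq 1$. Thus

$$\sum_{i=1}^N \|u_{im'}\|^p = N^{p/2} \sum_{i=1}^N \left(\frac{\|u_{im'}\|}{N^{1/2}}\right)^p \leq N^{p/2} \sum_{i=1}^N \left(\frac{\|u_{im'}\|}{N^{1/2}}\right)^2 = N^{p/2}\, \|u_{m'}\|_N^2 = N^{p/2}. \qquad (5.20)$$

We deduce from (5.18), (5.19) and (5.20) that

$$c^{-1}(p)\, N^p\, E|\langle u_{m'}, \varepsilon \rangle_N|^p \leq E\|\varepsilon_1\|^p N^{p/2} + E\|\varepsilon_1\|^p N^{p/2}.$$

Then, for some constant $c'(p)$ that only depends on $p$,

$$E|\langle u_{m'}, \varepsilon \rangle_N|^p \leq c'(p)\, E\|\varepsilon_1\|^p\, N^{-p/2}.$$

By this last inequality and (5.17) we get that

$$P\left(|\langle u_{m'}, \varepsilon \rangle_N| \geq t\right) \leq c'(p)\, E\|\varepsilon_1\|^p\, N^{-p/2}\, t^{-p}. \qquad (5.21)$$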

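Let $v$ be some positive number depending on $\eta$ only, to be chosen later. We take $t$ such that $N t^2 = \min\left(v, \frac{\alpha}{3}\right)(L_{m'} D_{m'} + x)\,\delta_{m'}^2$, and set $N p_2(m') = v L_{m'} D_{m'} \delta_{m'}^2$. We get

$$P_{2,m'}(x) = P\left(\frac{1}{\alpha}\left(\langle u_{m'}, \varepsilon \rangle_N^2 - p_2(m')\right) \geq \frac{x \delta_{m'}^2}{3N}\right)$$

$$= P\left(N \langle u_{m'}, \varepsilon \rangle_N^2 \geq N p_2(m') + \frac{\alpha}{3}\, x\, \delta_{m'}^2\right)$$

$$\leq P\left(|\langle u_{m'}, \varepsilon \rangle_N| \geq N^{-1/2} \sqrt{\min\left(v, \frac{\alpha}{3}\right)}\, \sqrt{L_{m'} D_{m'} + x}\;\delta_{m'}\right)$$

$$\leq c'(p)\, E\|\varepsilon_1\|^p\, N^{-p/2}\, \frac{N^{p/2}}{\left(\min\left(v, \frac{\alpha}{3}\right)\right)^{p/2} (L_{m'} D_{m'} + x)^{p/2}\, \delta_{m'}^p}$$

$$= c''(p, \eta)\, \frac{E\|\varepsilon_1\|^p}{\delta_{m'}^p}\, \frac{1}{(L_{m'} D_{m'} + x)^{p/2}}. \qquad (5.22)$$

The last inequality holds using (5.21).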

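We now bound $P_{1,m'}(x)$ for those $m' \in \mathcal{M}$ such that $D_{m'} \geq 1$. By using our version of Corollary 5.1 in Baraud with $\tilde A = P_{m'}$, $\operatorname{Tr}(\tilde A) = D_{m'}$ and $\rho(\tilde A) = 1$, we obtain from (4.3) that for any positive $x_{m'}$

$$P\left(N \|P_{m'} \varepsilon\|_N^2 \geq \delta_{m'}^2 D_{m'} + 2 \delta_{m'}^2 \sqrt{D_{m'} x_{m'}} + \delta_{m'}^2 x_{m'}\right) \leq C(p)\, \frac{E\|\varepsilon_1\|^p}{\delta_{m'}^p}\, D_{m'}\, x_{m'}^{-p/2}. \qquad (5.23)$$

Since for any $\beta > 0$, $2\sqrt{D_{m'} x_{m'}} \leq \beta D_{m'} + \beta^{-1} x_{m'}$, (5.23) implies that

$$P\left(N \|P_{m'} \varepsilon\|_N^2 \geq (1 + \beta) D_{m'} \delta_{m'}^2 + (1 + \beta^{-1})\, x_{m'} \delta_{m'}^2\right) \leq C(p)\, \frac{E\|\varepsilon_1\|^p}{\delta_{m'}^p}\, D_{m'}\, x_{m'}^{-p/2}. \qquad (5.24)$$

Now, for some number $\beta$ depending on $\eta$ only, to be chosen later, we take $x_{m'} = (1 + \beta^{-1})^{-1} \min\left(v, \frac{(2 - \alpha)^{-1}}{3}\right)(L_{m'} D_{m'} + x)$ and $N p_1(m') = v L_{m'} D_{m'} \delta_{m'}^2 + (1 + \beta) D_{m'} \delta_{m'}^2$. By (5.24) this gives

$$P_{1,m'}(x) = P\left(\|P_{m'} \varepsilon\|_N^2 - p_1(m') \geq \frac{(2 - \alpha)^{-1} x \delta_{m'}^2}{3N}\right)$$

$$= P\left(N \|P_{m'} \varepsilon\|_N^2 \geq v L_{m'} D_{m'} \delta_{m'}^2 + (1 + \beta) D_{m'} \delta_{m'}^2 + \frac{(2 - \alpha)^{-1}}{3}\, x\, \delta_{m'}^2\right)$$

$$\leq P\left(N \|P_{m'} \varepsilon\|_N^2 \geq (1 + \beta) D_{m'} \delta_{m'}^2 + (1 + \beta^{-1})\, x_{m'} \delta_{m'}^2\right)$$

$$\leq C(p)\, \frac{E\|\varepsilon_1\|^p}{\delta_{m'}^p}\, D_{m'}\, x_{m'}^{-p/2} \leq c'(p, \eta)\, \frac{E\|\varepsilon_1\|^p}{\delta_{m'}^p}\, \frac{D_{m'}}{(L_{m'} D_{m'} + x)^{p/2}}. \qquad (5.25)$$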
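Gathering (5.22), (5.25) and (5.16) we get that

$$P\left(H(f) \geq \frac{x \delta_{m'}^2}{N(1 - \alpha)}\right) \leq \sum_{m' \in \mathcal{M}} P_{1,m'}(x) + \sum_{m' \in \mathcal{M}} P_{2,m'}(x)$$

$$\leq \sum_{m' \in \mathcal{M}} c'(p, \eta)\, \frac{E\|\varepsilon_1\|^p}{\delta_{m'}^p}\, \frac{D_{m'}}{(L_{m'} D_{m'} + x)^{p/2}} + \sum_{m' \in \mathcal{M}} c''(p, \eta)\, \frac{E\|\varepsilon_1\|^p}{\delta_{m'}^p}\, \frac{1}{(L_{m'} D_{m'} + x)^{p/2}}.$$

Since $\frac{1}{1 - \alpha} = 1 + 2\eta^{-1}$, (5.3) holds:

$$P\left(H(f) \geq \left(1 + 2\eta^{-1}\right) \frac{x}{N}\, \delta_{m'}^2\right) \leq \sum_{m' \in \mathcal{M}} \max(D_{m'}, 1)\left(c'(p, \eta) + c''(p, \eta)\right) \frac{E\|\varepsilon_1\|^p}{\delta_{m'}^p\, (L_{m'} D_{m'} + x)^{p/2}}$$

$$= c(p, \eta)\, E\|\varepsilon_1\|^p \sum_{m' \in \mathcal{M}} \frac{1}{\delta_{m'}^p}\, \frac{D_{m'} \vee 1}{(L_{m'} D_{m'} + x)^{p/2}}.$$

It remains to choose $\beta$ and $v$ for (5.14) to hold (we recall that $\alpha = \frac{2}{2 + \eta}$). This is the case if $(2 - \alpha)(1 + \beta) = 1 + \eta$ and $(2 - \alpha + \alpha^{-1})\, v = 1$; therefore we take $\beta = \frac{\eta}{2}$ and

$$v = \left[1 + \frac{\eta}{2} + \frac{2(1 + \eta)}{2 + \eta}\right]^{-1}.$$

□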
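The algebra behind the final choice of constants can be checked with exact rational arithmetic. The sketch below is illustrative only: the names `constants` and `check`, the use of `v` for the constant multiplying $L_{m'} D_{m'} \delta_{m'}^2$ in $p_1$ and $p_2$, and the test values of $L$, $D$, $\delta^2$ and $N$ are assumptions made for the example. It verifies that, with $\alpha = 2/(2+\eta)$, the stated choices turn condition (5.14) into an equality with the penalty (5.2).

```python
from fractions import Fraction

def constants(eta):
    # Choices described at the end of the proof, for a rational eta > 0.
    alpha = Fraction(2) / (2 + eta)
    beta = eta / 2
    v = 1 / (1 + eta / 2 + 2 * (1 + eta) / (2 + eta))
    return alpha, beta, v

def check(eta, L, D, delta2, N):
    alpha, beta, v = constants(eta)
    # p_1, p_2 and pen as defined in the proof (each divided by N).
    p1 = (v * L * D * delta2 + (1 + beta) * D * delta2) / N
    p2 = v * L * D * delta2 / N
    pen = (1 + eta + L) * delta2 * D / N
    return (
        0 < alpha < 1
        and (2 - alpha) * (1 + beta) == 1 + eta
        and (2 - alpha + 1 / alpha) * v == 1
        and 1 / (1 - alpha) == 1 + 2 / eta
        and (2 - alpha) * p1 + p2 / alpha == pen
    )

# Exact check over a small grid of rational values of eta.
all_ok = all(
    check(Fraction(a, b), Fraction(3, 2), Fraction(7), Fraction(5, 4), Fraction(50))
    for a in range(1, 6)
    for b in range(1, 6)
)
```

Using `Fraction` avoids floating-point rounding, so the equalities are tested exactly rather than up to a tolerance.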
5.3 Proof of the concentration inequality

Proof of Proposition 4.3

Proof. Denote by $\tau^2$ the following expression:

$$\tau^2 := E\|P_m \varepsilon\|^2 = E\left(\varepsilon^\top P_m \varepsilon\right) = \operatorname{Tr}\left(P_m (I_N \otimes \Phi)\right).$$

Then we have that

$$\tau^2 = \operatorname{Tr}\left(P_m (I_N \otimes \Phi) P_m^\top\right) \leq \lambda_{\max}(I_N \otimes \Phi) \operatorname{Tr}(P_m) = \lambda_{\max}(\Phi) \operatorname{Tr}(P_m) = \lambda_{\max}(\Phi)\, D_m .$$
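The trace bound above is easy to illustrate numerically. The following sketch is an illustration under assumed sizes, not part of the proof: it builds a random positive semi-definite matrix `Phi`, a random orthogonal projector `P` of rank `D` on $\mathbb{R}^{Nd}$ (via a QR factorisation), and checks that $\operatorname{Tr}(P(I_N \otimes \Phi)) \leq \lambda_{\max}(\Phi)\, D$ and that $I_N \otimes \Phi$ has the same largest eigenvalue as $\Phi$. All names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, D = 5, 3, 4  # illustrative sizes: N blocks of dimension d, projector rank D

# Random covariance matrix Phi (d x d, positive semi-definite).
B = rng.normal(size=(d, d))
Phi = B @ B.T

# Random orthogonal projector of rank D, built from orthonormal columns Q.
Q, _ = np.linalg.qr(rng.normal(size=(N * d, D)))
P = Q @ Q.T

big_cov = np.kron(np.eye(N), Phi)    # block-diagonal covariance I_N kron Phi
trace_value = np.trace(P @ big_cov)  # E||P eps||^2 when Cov(eps) = I_N kron Phi
lam_max_phi = np.linalg.eigvalsh(Phi)[-1]
lam_max_big = np.linalg.eigvalsh(big_cov)[-1]

kron_spectrum_ok = bool(np.isclose(lam_max_big, lam_max_phi))
trace_bound_ok = bool(trace_value <= lam_max_phi * D + 1e-9)
rank_ok = bool(np.isclose(np.trace(P), D))
```

Since `eigvalsh` returns eigenvalues in ascending order, the last entry is the largest eigenvalue.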

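We have that $\eta^2(\varepsilon) := \varepsilon^\top \tilde A \varepsilon$, where $\tilde A = A^\top A$. Then

$$\eta(\varepsilon) = \|A\varepsilon\| = \sup_{\|u\| \leq 1} \langle A\varepsilon, u \rangle = \sup_{\|u\| \leq 1} \sum_{i=1}^{Nd} (A\varepsilon)_i\, u_i = \sup_{\|u\| \leq 1} \langle \varepsilon, A^\top u \rangle$$

$$= \sup_{\|u\| \leq 1} \sum_{i=1}^N \langle \varepsilon_i, A_i^\top u \rangle = \sup_{\|u\| \leq 1} \sum_{i=1}^N \sum_{j=1}^d \varepsilon_{ij}\, (A_i^\top u)_j ,$$

with $A = (A_1 \mid \dots \mid A_N)$, where $A_i$ is a $(Nd) \times d$ matrix.

Now take $\mathcal{G} = \{g_u : g_u(x) = \sum_{i=1}^N \langle x_i, A_i^\top u \rangle = \sum_{i=1}^N \langle B_i x, B_i A^\top u \rangle,\ u, x = (x_1, \dots, x_N) \in \mathbb{R}^{Nd},\ \|u\| \leq 1\}$.

Let $M_i = \operatorname{diag}(0, \dots, 0, I_d, 0, \dots, 0) \in \mathbb{R}^{(Nd) \times (Nd)}$, where $I_d$ is the $i$-th diagonal block of $M_i$, $B_i = [0, \dots, 0, I_d, 0, \dots, 0] \in \mathbb{R}^{d \times (Nd)}$, $\varepsilon_i = B_i \varepsilon$ and $M_i \varepsilon = [0, \dots, 0, \varepsilon_i, 0, \dots, 0]^\top$. Then

$$\eta(\varepsilon) = \sup_{\|u\| \leq 1} \sum_{i=1}^N g_u(M_i \varepsilon).$$

Now take $U_i = M_i \varepsilon$, $\varepsilon \in \mathbb{R}^{Nd}$. Then for each positive number $t$ and $p > 0$,

$$P\left(\eta(\varepsilon) \geq E(\eta(\varepsilon)) + t\right) \leq P\left(|\eta(\varepsilon) - E(\eta(\varepsilon))| > t\right)$$

$$\leq t^{-p}\, E\left(|\eta(\varepsilon) - E(\eta(\varepsilon))|^p\right) \quad \text{by Markov's inequality}$$

$$\leq c(p)\, t^{-p} \left\{ E\left(\max_{i=1,\dots,N}\ \sup_{\|u\| \leq 1} |\langle \varepsilon_i, A_i^\top u \rangle|^p\right) + \left[E\left(\sup_{\|u\| \leq 1} \sum_{i=1}^N \langle \varepsilon_i, A_i^\top u \rangle^2\right)\right]^{p/2} \right\}$$

$$= c(p)\, t^{-p} \left(E_1 + E_2^{p/2}\right). \qquad (5.26)$$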



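We start by bounding $E_1$. For all $u$ such that $\|u\| \leq 1$ and $i \in \{1, \dots, N\}$,

$$\|A_i^\top u\|^2 \leq \|A^\top u\|^2 \leq \rho^2(A),$$

where $\rho(M) = \sup_{x \neq 0} \frac{\|Mx\|}{\|x\|}$ for any matrix $M$. For $p \geq 2$ we have that $\|A_i^\top u\|^p \leq \rho^{p-2}(A)\, \|A_i^\top u\|^2$, then

$$|\langle \varepsilon_i, A_i^\top u \rangle|^p \leq \left(\|\varepsilon_i\|\, \|A_i^\top u\|\right)^p \leq \rho^{p-2}(A)\, \|\varepsilon_i\|^p\, \|A_i^\top u\|^2 .$$

Therefore

$$E_1 \leq \rho^{p-2}(A)\, E\left(\sup_{\|u\| = 1} \sum_{i=1}^N \|\varepsilon_i\|^p\, \|A_i^\top u\|^2\right).$$

Since $\|u\| \leq 1$, for all $i = 1, \dots, N$,

$$\|A_i^\top u\|^2 = u^\top A_i A_i^\top u \leq \rho(A_i A_i^\top) \leq \operatorname{Tr}(A_i A_i^\top),$$

then

$$\sum_{i=1}^N \operatorname{Tr}(A_i A_i^\top) = \operatorname{Tr}\left(\sum_{i=1}^N A_i A_i^\top\right) = \operatorname{Tr}(A A^\top) = \operatorname{Tr}(\tilde A).$$

Thus,

$$E_1 \leq \rho^{p-2}(A)\, \operatorname{Tr}(\tilde A)\, E\left(\|\varepsilon_1\|^p\right). \qquad (5.27)$$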

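We now bound $E_2$ via a truncation argument. Since for all $u$ such that $\|u\| \leq 1$ and $i \in \{1, \dots, N\}$, $\|A_i^\top u\|^2 \leq \rho^2(A)$, for any positive number $c$ to be specified later we have that

$$E_2 \leq E\left(\sup_{\|u\| \leq 1} \sum_{i=1}^N \|\varepsilon_i\|^2 \|A_i^\top u\|^2\, \mathbf{1}_{\{\|\varepsilon_i\| \leq c\}}\right) + E\left(\sup_{\|u\| \leq 1} \sum_{i=1}^N \|\varepsilon_i\|^2 \|A_i^\top u\|^2\, \mathbf{1}_{\{\|\varepsilon_i\| > c\}}\right)$$

$$\leq c^2\, E\left(\sup_{\|u\| \leq 1} \sum_{i=1}^N \|A_i^\top u\|^2\, \mathbf{1}_{\{\|\varepsilon_i\| \leq c\}}\right) + c^{2-p}\, E\left(\sup_{\|u\| \leq 1} \sum_{i=1}^N \|\varepsilon_i\|^p \|A_i^\top u\|^2\, \mathbf{1}_{\{\|\varepsilon_i\| > c\}}\right)$$

$$\leq c^2 \rho^2(A) + c^{2-p}\, E\left(\sup_{\|u\| \leq 1} \sum_{i=1}^N \|A_i^\top u\|^2 \|\varepsilon_i\|^p\right)$$

$$\leq c^2 \rho^2(A) + c^{2-p}\, E\left(\|\varepsilon_1\|^p\right) \operatorname{Tr}(\tilde A), \qquad (5.28)$$

using the bound obtained for $E_1$. It remains to take $c^p = E\left(\|\varepsilon_1\|^p\right) \operatorname{Tr}(\tilde A) / \rho^2(A)$ to get that

$$E_2 \leq c^2 \rho^2(A) + c^2 \rho^2(A) = 2 c^2 \rho^2(A),$$

therefore

$$E_2^{p/2} \leq 2^{p/2}\, c^p\, \rho^p(A), \qquad (5.29)$$

which implies that

$$2^{-p/2}\, E_2^{p/2} \leq E\left(\|\varepsilon_1\|^p\right) \operatorname{Tr}(\tilde A)\, \rho^{p-2}(A).$$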
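We straightforwardly derive from (5.26) that

$$P\left(\eta^2(\varepsilon) \geq [E(\eta(\varepsilon))]^2 + 2 E(\eta(\varepsilon))\, t + t^2\right) \leq c(p)\, t^{-p} \left(E_1 + E_2^{p/2}\right).$$

Since $[E(\eta(\varepsilon))]^2 \leq E(\eta^2(\varepsilon))$, (5.27) and (5.29) imply that

$$P\left(\eta^2(\varepsilon) \geq E(\eta^2(\varepsilon)) + 2\sqrt{E(\eta^2(\varepsilon))\, t^2} + t^2\right) \leq c(p)\, t^{-p} \left(E_1 + E_2^{p/2}\right)$$

$$\leq c(p)\, t^{-p} \left(\rho^{p-2}(A) \operatorname{Tr}(\tilde A)\, E(\|\varepsilon_1\|^p) + 2^{p/2}\, E(\|\varepsilon_1\|^p) \operatorname{Tr}(\tilde A)\, \rho^{p-2}(A)\right)$$

$$\leq c'(p)\, t^{-p}\, \rho^{p-2}(A) \operatorname{Tr}(\tilde A)\, E(\|\varepsilon_1\|^p), \qquad (5.30)$$

for all $t > 0$. Moreover,

$$E\left(\eta^2(\varepsilon)\right) = E\left(\varepsilon^\top \tilde A \varepsilon\right) = E\left(\|A\varepsilon\|^2\right) = E\left\|\sum_{i=1}^N A_i \varepsilon_i\right\|^2 .$$

But it is better to use that

$$E\left(\eta^2(\varepsilon)\right) = \operatorname{Tr}\left(\tilde A\, E(\varepsilon \varepsilon^\top)\right) = \operatorname{Tr}\left(\tilde A (I_N \otimes \Phi)\right) = \operatorname{Tr}\left(A^\top A (I_N \otimes \Phi)\right) = \operatorname{Tr}\left(A (I_N \otimes \Phi) A^\top\right)$$

$$\leq \lambda_{\max}(I_N \otimes \Phi) \operatorname{Tr}(A A^\top) = \lambda_{\max}(I_N \otimes \Phi) \operatorname{Tr}(\tilde A) = \lambda_{\max}(Q) \operatorname{Tr}(\tilde A), \qquad (5.31)$$

for $Q = I_N \otimes \Phi$.

Using (5.31), take $t^2 = \rho(\tilde A)\, \lambda_{\max}(I_N \otimes \Phi)\, x > 0$ in (5.30) to get that

$$P\left(\eta^2(\varepsilon) \geq \lambda_{\max}(Q) \operatorname{Tr}(\tilde A) + 2\sqrt{\lambda_{\max}(Q) \operatorname{Tr}(\tilde A)\, \rho(\tilde A)\, \lambda_{\max}(Q)\, x} + \rho(\tilde A)\, \lambda_{\max}(Q)\, x\right)$$

$$\leq c'(p)\, \rho^{-p/2}(\tilde A)\, \lambda_{\max}^{-p/2}(Q)\, x^{-p/2}\, \rho^{p-2}(A) \operatorname{Tr}(\tilde A)\, E(\|\varepsilon_1\|^p).$$

Since $\rho(\tilde A) = \rho^2(A)$ (with the Euclidean norm), the desired result follows:

$$P\left(\eta^2(\varepsilon) \geq \lambda_{\max}(Q) \operatorname{Tr}(\tilde A) + 2 \lambda_{\max}(Q) \sqrt{\rho(\tilde A) \operatorname{Tr}(\tilde A)\, x} + \lambda_{\max}(Q)\, \rho(\tilde A)\, x\right) \leq C(p)\, \frac{E\|\varepsilon_1\|^p \operatorname{Tr}(\tilde A)}{\left(\sqrt{\lambda_{\max}(Q)}\right)^p \rho(\tilde A)\, x^{p/2}}. \qquad (5.32)$$

□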